AIML Knowledge Base Construction from Text Corpora

Authors

  • Giovanni De Gasperis
  • Isabella Chiari
  • Niva Florio
Abstract

Text mining (TM) and computational linguistics (CL) are computationally intensive fields in which many tools are becoming available to study large text corpora and to exploit corpora for various purposes. In this chapter we address the problem of building conversational agents, or chatbots, from corpora for domain-specific educational purposes. After addressing some linguistic issues relevant to developing chatbot tools from corpora, we present a methodology for systematically analyzing large text corpora about a limited knowledge domain. Taking the Artificial Intelligence Markup Language (AIML) as the “assembly language” of artificial intelligence conversational agents, we present a way of using text corpora as a seed from which a set of “source files” can be derived. More specifically, we illustrate how to use corpus data to extract relevant keywords and multiword expressions, build glossaries and identify text patterns in order to build an AIML knowledge base that can later be used to build interactive conversational systems. The approach we propose does not require deep understanding techniques for text analysis. As a case study, we show how to build the knowledge base of an English conversational agent for educational purposes from a children's story; the agent can answer questions about the characters, facts and episodes of the story. A discussion of the main linguistic and methodological issues and of further improvements is offered in the final part of the chapter.
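The abstract outlines a route from raw text to an AIML knowledge base without deep text understanding. What follows is a minimal sketch of that idea, not the authors' implementation. Keyword extraction is approximated by raw frequency after stopword filtering. Multiword expressions are approximated by recurring stopword-free bigrams. Glossary entries come from a hypothetical "Name is a/an/the ..." sentence pattern. The stopword list, the function names, the toy story and the WHO IS / WHAT IS question patterns are all illustrative assumptions.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape

# Illustrative stopword list (an assumption; a real pipeline would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "by",
             "he", "she", "it", "his", "her", "that", "with", "on", "for", "as", "at"}


def tokenize(text):
    """Lower-cased word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_keywords(tokens, n=10):
    """Most frequent non-stopword tokens: a crude stand-in for keyword extraction."""
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(n)]


def bigram_candidates(tokens, min_count=2):
    """Recurring stopword-free word pairs, kept as multiword-expression candidates."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return [f"{a} {b}" for (a, b), c in pairs.items()
            if c >= min_count and a not in STOPWORDS and b not in STOPWORDS]


def glossary_from_definitions(text):
    """Collect 'Name is a/an/the ...' sentences as glossary entries (hypothetical pattern)."""
    glossary = {}
    for m in re.finditer(r"\b([A-Z][a-z]+) is (?:an?|the) [^.]+\.", text):
        glossary.setdefault(m.group(1), m.group(0))
    return glossary


def category(pattern, template_xml):
    """One AIML category; template_xml must already be XML-safe."""
    return (f"<category><pattern>{pattern}</pattern>"
            f"<template>{template_xml}</template></category>")


def build_aiml(text):
    """Turn glossary entries into WHO IS / WHAT IS question categories."""
    cats = []
    for term, sentence in glossary_from_definitions(text).items():
        # Terms are plain letters, so the pattern needs no escaping;
        # AIML patterns are conventionally written in upper case.
        key = f"WHO IS {term.upper()}"
        cats.append(category(key, escape(sentence)))
        # Redirect the alternative question form to the same answer via <srai>.
        cats.append(category(f"WHAT IS {term.upper()}", f"<srai>{key}</srai>"))
    body = "\n".join(cats)
    return f'<?xml version="1.0" encoding="UTF-8"?>\n<aiml version="1.0.1">\n{body}\n</aiml>'


if __name__ == "__main__":
    # Invented toy text standing in for a children's story.
    story = ("Tom is a small grey cat. Tom lives in the old mill. "
             "Anna is the miller's daughter. Anna feeds Tom every morning. "
             "The old mill stands by the river.")
    tokens = tokenize(story)
    print(top_keywords(tokens, 1))    # ['tom']
    print(bigram_candidates(tokens))  # ['old mill']
    print(build_aiml(story))
```

Raw frequency and bigram counts are deliberately naive here. Keyness against a reference corpus, or association measures such as pointwise mutual information, would be natural refinements for selecting keywords and multiword expressions.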


Similar resources

ALICE Chatbot: Trials and Outputs

A chatbot is a conversational agent that interacts with users using natural language. Many chatbots are available to serve in different domains. However, the knowledge base of a chatbot is hand-coded in its brain. This paper presents an overview of the ALICE chatbot, its AIML format, and our experiments to generate different prototypes of ALICE automatically based on a corpus approach. A descriptio...


PyGenbot for IoT: a demonstration of how to generate any restricted stateless AIML FAQ-chatter bot from text files

Internet of Things (IoT) applications are required to interact with the user in the most natural way possible; voice-based conversation is the ultimate human-machine interaction in terms of ease of use and requirements on the user's part, and it also lets the user interact hands-free, without necessarily watching a computer screen. Chatter bots are conversational agents that si...


Using dialogue corpora to train a chatbot

This paper presents two chatbot systems, ALICE and Elizabeth, illustrating the dialogue knowledge representation and pattern matching techniques of each. We discuss the problems which arise when using the Dialogue Diversity Corpus to retrain a chatbot system with human dialogue examples. A Java program to convert from dialog transcript to AIML format provides a basic implementation of corpusbas...


Extracting a parallel corpus from comparable documents to improve translation quality in machine translation systems

Data used for training statistical machine translation methods are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation, but due to the lack of such texts, non-parallel and comparable corpora are used instead for parallel text extraction. Most of existing methods for exploiting comparable corpora loo...


KELVIN: a tool for automated knowledge base construction

We present KELVIN, an automated system for processing a large text corpus and distilling a knowledge base about persons, organizations, and locations. We have tested the KELVIN system on several corpora, including: (a) the TAC KBP 2012 Cold Start corpus which consists of public Web pages from the University of Pennsylvania, and (b) a subset of 26k news articles taken from English Gigaword 5th e...



Publication date: 2013